Results 1 - 20 of 187
1.
BMC Bioinformatics ; 25(1): 101, 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38448845

ABSTRACT

PURPOSE: The expansion of research across various disciplines has led to a substantial increase in published papers and journals, highlighting the necessity for reliable text mining platforms for database construction and knowledge acquisition. This abstract introduces GPDMiner (Gene, Protein, and Disease Miner), a platform designed for the biomedical domain, addressing the challenges posed by the growing volume of academic papers. METHODS: GPDMiner is a text mining platform that utilizes advanced information retrieval techniques. It operates by searching PubMed for specific queries, extracting and analyzing information relevant to the biomedical field. This system is designed to discern and illustrate relationships between biomedical entities obtained from automated information extraction. RESULTS: The implementation of GPDMiner demonstrates its efficacy in navigating the extensive corpus of biomedical literature. It efficiently retrieves, extracts, and analyzes information, highlighting significant connections between genes, proteins, and diseases. The platform also allows users to save their analytical outcomes in various formats, including Excel and images. CONCLUSION: GPDMiner offers a notable additional functionality among the array of text mining tools available for the biomedical field. This tool presents an effective solution for researchers to navigate and extract relevant information from the vast unstructured texts found in biomedical literature, thereby providing distinctive capabilities that set it apart from existing methodologies. Its application is expected to greatly benefit researchers in this domain, enhancing their capacity for knowledge discovery and data management.


Subjects
Data Management, Data Mining, Databases, Factual, Knowledge Discovery, PubMed
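
As a hedged illustration of the retrieval step described in the GPDMiner entry above, the sketch below queries PubMed through the NCBI E-utilities API and fetches the matching abstracts. It is a minimal stand-in for the kind of PubMed search the platform performs, not the GPDMiner implementation itself; the query string and result limit are arbitrary.

```python
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def fetch_pubmed_abstracts(query: str, max_results: int = 20) -> str:
    """Search PubMed for `query` and return the matching abstracts as plain text."""
    # Step 1: esearch returns the PMIDs matching the query.
    search = requests.get(
        f"{EUTILS}/esearch.fcgi",
        params={"db": "pubmed", "term": query, "retmax": max_results, "retmode": "json"},
        timeout=30,
    ).json()
    pmids = search["esearchresult"]["idlist"]
    if not pmids:
        return ""
    # Step 2: efetch retrieves the abstract text for those PMIDs.
    fetch = requests.get(
        f"{EUTILS}/efetch.fcgi",
        params={"db": "pubmed", "id": ",".join(pmids), "rettype": "abstract", "retmode": "text"},
        timeout=30,
    )
    return fetch.text

if __name__ == "__main__":
    print(fetch_pubmed_abstracts("BRCA1 AND breast neoplasms", max_results=5)[:1000])
```
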
2.
Comput Biol Med ; 169: 107810, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38134749

ABSTRACT

Non-silent single nucleotide genetic variants, like nonsense changes and insertion-deletion variants, that affect protein function and length substantially are prevalent and are frequently misclassified. The low sensitivity and specificity of existing variant effect predictors for nonsense and indel variations restrict their use in clinical applications. We propose the Pathogenic Mutation Prediction (PMPred) method to predict the pathogenicity of single nucleotide variations, which impair protein function by prematurely terminating a protein's elongation during its synthesis. The prediction starts by monitoring functional effects (Gene Ontology annotation changes) of the change in sequence, using an existing ensemble machine learning model (UniGOPred). This, in turn, reveals the mutations that significantly deviate functionally from the wild-type sequence. We have identified novel harmful mutations in patient data and present them as motivating case studies. We also show that our method has increased sensitivity and specificity compared to state-of-the-art, especially in single nucleotide variations that produce large functional changes in the final protein. As further validation, we have done a comparative docking study on such a variation that is misclassified by existing methods and, using the altered binding affinities, show how PMPred can correctly predict the pathogenicity when other tools miss it. PMPred is freely accessible as a web service at https://pmpred.kansil.org/, and the related code is available at https://github.com/kansil/PMPred.


Subjects
Exome, Knowledge Discovery, Humans, Exome Sequencing, Mutation, Nucleotides, Computational Biology/methods
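
The abstract above describes monitoring Gene Ontology annotation changes between a wild-type protein and a prematurely terminated variant. The sketch below only illustrates that idea, quantifying functional deviation as one minus the Jaccard similarity of two predicted GO term sets; the `predict_go_terms` callable is a hypothetical placeholder rather than a real UniGOPred call, and the toy sequences are invented.

```python
from typing import Callable, Set

def go_deviation(wild_type_seq: str,
                 mutant_seq: str,
                 predict_go_terms: Callable[[str], Set[str]]) -> float:
    """Return 1 - Jaccard similarity between the GO term sets predicted for the
    wild-type and mutant sequences (0 = identical function, 1 = no overlap)."""
    wt_terms = predict_go_terms(wild_type_seq)
    mut_terms = predict_go_terms(mutant_seq)
    if not wt_terms and not mut_terms:
        return 0.0
    return 1.0 - len(wt_terms & mut_terms) / len(wt_terms | mut_terms)

# Toy predictor standing in for a real GO annotation model such as UniGOPred.
toy_predictor = lambda seq: {"GO:0003677", "GO:0006281"} if len(seq) > 50 else {"GO:0003677"}

full_length = "M" * 120  # wild-type protein (illustrative)
truncated = "M" * 30     # premature-termination variant (illustrative)
print(f"functional deviation: {go_deviation(full_length, truncated, toy_predictor):.2f}")
```
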
3.
Article in English | MEDLINE | ID: mdl-38083156

ABSTRACT

Discovering knowledge and effectively predicting target events are two main goals of medical text mining. However, few models can achieve them simultaneously. In this study, we investigated the possibility of discovering knowledge and predicting diagnosis at once via raw medical text. We proposed the Enhanced Neural Topic Model (ENTM), a variant of the neural topic model, to learn interpretable representations. We introduced an auxiliary loss set to improve the effectiveness of the learned representations. Then, we used the learned representations to train a softmax regression model to predict target events. As each element in the representations learned by the ENTM has an explicit semantic meaning, the weights in the softmax regression represent potential knowledge of whether an element is a significant factor in predicting diagnosis. We adopted two independent medical text datasets to evaluate our ENTM model. Results indicate that our model performed better than the latest pretrained neural language models. Meanwhile, analysis of model parameters indicates that our model has the potential to discover knowledge from data. Clinical relevance: This work provides a model that can effectively predict patient diagnosis and has the potential to discover knowledge from medical text.


Subjects
Knowledge Discovery, Neural Networks, Computer, Humans, Learning, Language, Semantics
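
A rough sketch of the second stage described above: fitting a (here binary) logistic/softmax regression on interpretable topic proportions and reading its weights as candidate knowledge. The topic matrix is random Dirichlet data standing in for ENTM output, and the diagnosis labels are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for ENTM output: each row is a clinical note, each column an interpretable topic.
topic_repr = rng.dirichlet(np.ones(10), size=500)               # 500 notes x 10 topics
diagnosis = (topic_repr[:, 3] + 0.5 * topic_repr[:, 7]          # synthetic target driven
             + 0.1 * rng.normal(size=500) > 0.25).astype(int)   # mainly by topics 3 and 7

# Logistic (binary softmax) regression over the topic proportions.
clf = LogisticRegression(max_iter=1000).fit(topic_repr, diagnosis)

# Because each input dimension is a named topic, large weights point at topics that
# are candidate "knowledge" about the predicted diagnosis.
for topic_id in np.argsort(-np.abs(clf.coef_[0]))[:3]:
    print(f"topic {topic_id}: weight {clf.coef_[0][topic_id]:+.2f}")
```
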
4.
BMC Bioinformatics ; 24(1): 412, 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37915001

ABSTRACT

BACKGROUND: The PubMed archive contains more than 34 million articles; consequently, it is becoming increasingly difficult for a biomedical researcher to keep up-to-date with different knowledge domains. Computationally efficient and interpretable tools are needed to help researchers find and understand associations between biomedical concepts. The goal of literature-based discovery (LBD) is to connect concepts in isolated literature domains that would normally go undiscovered. This usually takes the form of an A-B-C relationship, where A and C terms are linked through a B term intermediate. Here we describe Serial KinderMiner (SKiM), an LBD algorithm for finding statistically significant links between an A term and one or more C terms through some B term intermediate(s). The development of SKiM is motivated by the observation that there are only a few LBD tools that provide a functional web interface, and that the available tools are limited in one or more of the following ways: (1) they identify a relationship but not the type of relationship, (2) they do not allow the user to provide their own lists of B or C terms, hindering flexibility, (3) they do not allow for querying thousands of C terms (which is crucial if, for instance, the user wants to query connections between a disease and the thousands of available drugs), or (4) they are specific for a particular biomedical domain (such as cancer). We provide an open-source tool and web interface that improves on all of these issues. RESULTS: We demonstrate SKiM's ability to discover useful A-B-C linkages in three control experiments: classic LBD discoveries, drug repurposing, and finding associations related to cancer. Furthermore, we supplement SKiM with a knowledge graph built with transformer machine-learning models to aid in interpreting the relationships between terms found by SKiM. Finally, we provide a simple and intuitive open-source web interface ( https://skim.morgridge.org ) with comprehensive lists of drugs, diseases, phenotypes, and symptoms so that anyone can easily perform SKiM searches. CONCLUSIONS: SKiM is a simple algorithm that can perform LBD searches to discover relationships between arbitrary user-defined concepts. SKiM is generalized for any domain, can perform searches with many thousands of C term concepts, and moves beyond the simple identification of an existence of a relationship; many relationships are given relationship type labels from our knowledge graph.


Subjects
Algorithms, Neoplasms, Humans, PubMed, Knowledge, Knowledge Discovery
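
The toy sketch below is not SKiM itself; it only illustrates the A-B-C pattern with a statistical filter, scoring the A-B and B-C co-occurrences with a one-sided Fisher's exact test over made-up document counts and flagging an unlinked A-C pair as a candidate discovery. The corpus size and counts are fabricated for illustration.

```python
from scipy.stats import fisher_exact

N_DOCS = 1_000_000  # size of the hypothetical literature corpus

def cooccurrence_p(n_ab: int, n_a: int, n_b: int, n_docs: int = N_DOCS) -> float:
    """One-sided Fisher's exact test: is term A over-represented in documents mentioning B?"""
    table = [[n_ab, n_a - n_ab],
             [n_b - n_ab, n_docs - n_a - n_b + n_ab]]
    _, p_value = fisher_exact(table, alternative="greater")
    return p_value

# Toy document counts: per-term document frequencies and pairwise co-mention counts.
doc_counts = {"raynaud disease": 2000, "blood viscosity": 5000, "fish oil": 3000}
pair_counts = {("raynaud disease", "blood viscosity"): 120,
               ("blood viscosity", "fish oil"): 90,
               ("raynaud disease", "fish oil"): 0}   # the "undiscovered" A-C link

a_term, b_term, c_term = "raynaud disease", "blood viscosity", "fish oil"
p_ab = cooccurrence_p(pair_counts[(a_term, b_term)], doc_counts[a_term], doc_counts[b_term])
p_bc = cooccurrence_p(pair_counts[(b_term, c_term)], doc_counts[b_term], doc_counts[c_term])

if p_ab < 1e-5 and p_bc < 1e-5 and pair_counts[(a_term, c_term)] == 0:
    print(f"Candidate A-C link: {a_term} -> {c_term} via {b_term} "
          f"(p_AB={p_ab:.1e}, p_BC={p_bc:.1e})")
```
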
5.
J Chem Inf Model ; 63(21): 6569-6586, 2023 11 13.
Article in English | MEDLINE | ID: mdl-37883649

ABSTRACT

Web ontologies are important tools in modern scientific research because they provide a standardized way to represent and manage web-scale amounts of complex data. In chemistry, a semantic database for chemical species is indispensable for its ability to interrelate and infer relationships, enabling a more precise analysis and prediction of chemical behavior. This paper presents OntoSpecies, a web ontology designed to represent chemical species and their properties. The ontology serves as a core component of The World Avatar knowledge graph chemistry domain and includes a wide range of identifiers, chemical and physical properties, chemical classifications and applications, and spectral information associated with each species. The ontology includes provenance and attribution metadata, ensuring the reliability and traceability of data. Most of the information about chemical species is sourced from PubChem and ChEBI data on the respective compound Web pages using a software agent, making OntoSpecies a comprehensive semantic database of chemical species able to solve novel types of problems in the field. Access to this reliable source of chemical data is provided through a SPARQL endpoint. The paper presents example use cases to demonstrate the contribution of OntoSpecies in solving complex tasks that require integrated semantically searchable chemical data. The approach presented in this paper represents a significant advancement in the field of chemical data management, offering a powerful tool for representing, navigating, and analyzing chemical information to support scientific research.


Subjects
Knowledge Discovery, Software, Reproducibility of Results, Databases, Factual, Semantics
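
As a hedged sketch of how a SPARQL endpoint like the one described above might be queried from Python, the snippet below uses the SPARQLWrapper library; the endpoint URL is a placeholder and the query uses a generic rdfs:label pattern rather than the actual OntoSpecies vocabulary.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Placeholder endpoint -- substitute the real OntoSpecies/The World Avatar endpoint
# and vocabulary IRIs when querying the actual knowledge graph.
ENDPOINT = "https://example.org/ontospecies/sparql"

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?species ?label
WHERE {
  ?species rdfs:label ?label .
  FILTER(CONTAINS(LCASE(STR(?label)), "benzene"))
}
LIMIT 10
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["species"]["value"], "->", binding["label"]["value"])
```
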
6.
J Biomed Inform ; 145: 104464, 2023 09.
Article in English | MEDLINE | ID: mdl-37541406

ABSTRACT

OBJECTIVE: We explore the framing of literature-based discovery (LBD) as link prediction and graph embedding learning, with Alzheimer's Disease (AD) as our focus disease context. The key link prediction setting of prediction window length is specifically examined in the context of a time-sliced evaluation methodology. METHODS: We propose a four-stage approach to explore literature-based discovery for Alzheimer's Disease, creating and analyzing a knowledge graph tailored to the AD context, and predicting and evaluating new knowledge based on time-sliced link prediction. The first stage is to collect an AD-specific corpus. The second stage involves constructing an AD knowledge graph with identified AD-specific concepts and relations from the corpus. In the third stage, 20 pairs of training and testing datasets are constructed with the time-slicing methodology. Finally, we infer new knowledge with graph embedding-based link prediction methods. We compare different link prediction methods in this context. The impact of limiting prediction evaluation of LBD models in the context of short-term and longer-term knowledge evolution for Alzheimer's Disease is assessed. RESULTS: We constructed an AD corpus of over 16 k papers published in 1977-2021, and automatically annotated it with concepts and relations covering 11 AD-specific semantic entity types. The knowledge graph of Alzheimer's Disease derived from this resource consisted of ∼11 k nodes and ∼394 k edges, among which 34% were genotype-phenotype relationships, 57% were genotype-genotype relationships, and 9% were phenotype-phenotype relationships. A Structural Deep Network Embedding (SDNE) model consistently showed the best performance in terms of returning the most confident set of link predictions as time progresses over 20 years. A huge improvement in model performance was observed when changing the link prediction evaluation setting to consider a more distant future, reflecting the time required for knowledge accumulation. CONCLUSION: Neural network graph-embedding link prediction methods show promise for the literature-based discovery context, although the prediction setting is extremely challenging, with graph densities of less than 1%. Varying prediction window length on the time-sliced evaluation methodology leads to hugely different results and interpretations of LBD studies. Our approach can be generalized to enable knowledge discovery for other diseases. AVAILABILITY: Code, AD ontology, and data are available at https://github.com/READ-BioMed/readbiomed-lbd.


Subjects
Alzheimer Disease, Knowledge Discovery, Humans, Knowledge Discovery/methods, Alzheimer Disease/diagnosis, Neural Networks, Computer, Learning, Phenotype
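
The snippet below is a minimal, assumption-laden rendering of the time-sliced evaluation protocol described above: edges dated up to a split year form the training graph, later edges are the test set, and a simple common-neighbours score stands in for a graph-embedding model such as SDNE. The concept graph and years are invented.

```python
import networkx as nx

# Toy co-occurrence edges annotated with the year they first appeared in the literature.
edges = [("APOE", "amyloid beta", 1995), ("amyloid beta", "memory loss", 1998),
         ("APOE", "tau", 2001), ("tau", "memory loss", 2004),
         ("APOE", "memory loss", 2010)]          # the "future" edge to be predicted

split_year = 2005
train = nx.Graph([(u, v) for u, v, year in edges if year <= split_year])
future = {(u, v) for u, v, year in edges if year > split_year}

# Score every non-edge in the training slice by its number of common neighbours
# (a simple heuristic standing in for a learned graph embedding).
candidates = list(nx.non_edges(train))
scored = sorted(candidates,
                key=lambda pair: len(list(nx.common_neighbors(train, *pair))),
                reverse=True)

for u, v in scored[:3]:
    hit = (u, v) in future or (v, u) in future
    status = " and confirmed after " + str(split_year) if hit else ""
    print(f"{u} -- {v}: predicted{status}")
```
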
7.
Exp Mol Pathol ; 132-133: 104867, 2023 08.
Article in English | MEDLINE | ID: mdl-37634863

ABSTRACT

Mast cells (MCs) are tissue-resident innate immune cells that express the high-affinity receptor for immunoglobulin E and are responsible for host defense and an array of diseases related to the immune system. In this study, we aimed to characterize the pathways and gene signatures of human cord blood-derived MCs (hCBMCs) in comparison to cells originating from CD34- progenitors using next-generation knowledge discovery methods. CD34+ cells were isolated from human umbilical cord blood using magnetic activated cell sorting and differentiated into MCs with rhIL-6 and rhSCF supplementation for 6-8 weeks. The purity of hCBMCs was analyzed by flow cytometry, which showed the surface marker profile CD117+CD34-CD45-CD23-FcεR1αdim. Total RNA from hCBMCs and CD34- cells was isolated and hybridized using microarrays. Differentially expressed genes were analyzed using iPathway Guide and Pre-Ranked Gene Set Enrichment Analysis. Next-generation knowledge discovery platforms revealed MC-specific gene signatures and molecular pathways that are enriched in hCBMCs and pertain to the immunological response repertoire.


Subjects
Fetal Blood, Mast Cells, Humans, Knowledge Discovery, Antigens, CD34/genetics, Cell Differentiation/genetics
8.
Environ Sci Pollut Res Int ; 30(42): 95155-95171, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37597148

ABSTRACT

This paper aims at analyzing the research productivity and scientific knowledge discovery of the COVID-19 pandemic in agriculture using a bibliometric analysis approach. A total of 1514 research papers indexed in the Scopus database, covering the period 2020 to 2022, are processed using VOSviewer and R-Studio software. The analysis of research productivity indicates that the number of research publications on COVID-19 and agriculture has increased exponentially globally, and about 80% of the research papers have been published in the top 10 countries, led by the USA, India, and China. These countries are increasingly collaborating in undertaking research on COVID-19 and agriculture. Furthermore, major journals and articles with citations have been extracted to analyze the leading publication avenues and focused areas of research. Science mapping is done using co-occurrence analysis and a thematic map. With the help of co-occurrence analysis, six clusters are identified depicting major research themes, i.e., COVID-19 and agricultural supply chain disruption, COVID-19 and human health issues and coping strategies, COVID-19 and non-human and animal health, COVID-19 pandemic and environment and pollution, COVID-19 and healthcare and treatment, and COVID-19 and food nutrition from dairy and meat products. The thematic map analysis identifies potential research areas such as mental health, anxiety, and depression in the agricultural system, which may help in setting the future research agenda and in devising policy support for better managing the agriculture sector during crises. The paper also highlights the theoretical and practical implications.


Subjects
COVID-19, Knowledge Discovery, Animals, Humans, Pandemics, Bibliometrics, Agriculture
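
As a small illustration of the co-occurrence counting that underlies the science mapping described above (the kind of matrix a tool like VOSviewer clusters into themes), the sketch below tallies keyword pairs over invented author-keyword lists.

```python
from collections import Counter
from itertools import combinations

# Invented author-keyword lists, one per paper.
papers = [
    ["covid-19", "food supply chain", "agriculture"],
    ["covid-19", "mental health", "farmers"],
    ["covid-19", "agriculture", "food security"],
    ["covid-19", "food supply chain", "food security"],
]

cooccurrence = Counter()
for keywords in papers:
    for a, b in combinations(sorted(set(keywords)), 2):
        cooccurrence[(a, b)] += 1

# The strongest links are the edges a science-mapping tool would cluster into themes.
for (a, b), count in cooccurrence.most_common(5):
    print(f"{a} -- {b}: {count}")
```
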
9.
IEEE J Biomed Health Inform ; 27(10): 5099-5109, 2023 10.
Article in English | MEDLINE | ID: mdl-37498763

ABSTRACT

Difficulty in knowledge validation is a significant hindrance to knowledge discovery via data mining, especially automatic validation without human participation. In the field of medical research, medical knowledge discovery from electronic medical records is a common medical data mining method, but it is difficult to validate the discovered medical knowledge without the participation of medical experts. In this article, we propose a data-driven medical knowledge discovery closed-loop pipeline based on interpretable machine learning and deep learning; the components of the pipeline include Data Generator, Medical Knowledge Mining, Medical Knowledge Evaluation, and Medical Knowledge Application. In addition to completing the discovery of medical knowledge, the pipeline can also automatically validate the knowledge. We apply our pipeline's discovered medical knowledge to a traditional prognostic predictive model of heart failure in a real-world study, demonstrating that the incorporation of medical knowledge can effectively improve the performance of the traditional model. We also construct a scale model based on the discovered medical knowledge and demonstrate that it achieves good performance. To guarantee its medical effectiveness, every process of our pipeline involves the participation of medical experts.


Subjects
Artificial Intelligence, Knowledge Discovery, Humans, Machine Learning, Data Mining/methods, Prognosis
10.
JMIR Mhealth Uhealth ; 11: e42750, 2023 06 28.
Article in English | MEDLINE | ID: mdl-37379057

ABSTRACT

BACKGROUND: Over the past few decades, there has been a rapid increase in the number of wearable sleep trackers and mobile apps in the consumer market. Consumer sleep tracking technologies allow users to track sleep quality in naturalistic environments. In addition to tracking sleep per se, some sleep tracking technologies also support users in collecting information on their daily habits and sleep environments and reflecting on how those factors may contribute to sleep quality. However, the relationship between sleep and contextual factors may be too complex to be identified through visual inspection and reflection. Advanced analytical methods are needed to discover new insights into the rapidly growing volume of personal sleep tracking data. OBJECTIVE: This review aimed to summarize and analyze the existing literature that applies formal analytical methods to discover insights in the context of personal informatics. Guided by the problem-constraints-system framework for literature review in computer science, we framed 4 main questions regarding general research trends, sleep quality metrics, contextual factors considered, knowledge discovery methods, significant findings, challenges, and opportunities of the interested topic. METHODS: Web of Science, Scopus, ACM Digital Library, IEEE Xplore, ScienceDirect, Springer, Fitbit Research Library, and Fitabase were searched to identify publications that met the inclusion criteria. After full-text screening, 14 publications were included. RESULTS: The research on knowledge discovery in sleep tracking is limited. More than half of the studies (8/14, 57%) were conducted in the United States, followed by Japan (3/14, 21%). Only a few of the publications (5/14, 36%) were journal articles, whereas the remaining were conference proceeding papers. The most used sleep metrics were subjective sleep quality (4/14, 29%), sleep efficiency (4/14, 29%), sleep onset latency (4/14, 29%), and time at lights off (3/14, 21%). Ratio parameters such as deep sleep ratio and rapid eye movement ratio were not used in any of the reviewed studies. A dominant number of the studies applied simple correlation analysis (3/14, 21%), regression analysis (3/14, 21%), and statistical tests or inferences (3/14, 21%) to discover the links between sleep and other aspects of life. Only a few studies used machine learning and data mining for sleep quality prediction (1/14, 7%) or anomaly detection (2/14, 14%). Exercise, digital device use, caffeine and alcohol consumption, places visited before sleep, and sleep environments were important contextual factors substantially correlated to various dimensions of sleep quality. CONCLUSIONS: This scoping review shows that knowledge discovery methods have great potential for extracting hidden insights from a flux of self-tracking data and are considered more effective than simple visual inspection. Future research should address the challenges related to collecting high-quality data, extracting hidden knowledge from data while accommodating within-individual and between-individual variations, and translating the discovered knowledge into actionable insights.


Subjects
Knowledge Discovery, Mobile Applications, Humans, United States, Exercise, Fitness Trackers, Sleep
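
A minimal example of the "simple correlation analysis" most of the reviewed studies applied, run on fabricated self-tracking data: pairwise Pearson correlations between contextual factors and a sleep-quality metric. The column names and effect sizes are invented.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_nights = 90

# Fabricated self-tracking log: contextual factors plus a sleep-quality metric.
log = pd.DataFrame({
    "caffeine_mg": rng.integers(0, 300, n_nights),
    "screen_minutes_before_bed": rng.integers(0, 120, n_nights),
    "exercise_minutes": rng.integers(0, 60, n_nights),
})
log["sleep_efficiency"] = (
    0.92 - 0.0003 * log["caffeine_mg"] - 0.0008 * log["screen_minutes_before_bed"]
    + 0.0004 * log["exercise_minutes"] + rng.normal(0, 0.02, n_nights)
)

# Pairwise Pearson correlations between each contextual factor and sleep efficiency.
print(log.corr()["sleep_efficiency"].drop("sleep_efficiency").round(2))
```
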
11.
Cambios rev. méd ; 22(1): e883, 2023 Jun 30. illus.
Article in Spanish | LILACS | ID: biblio-1451949

ABSTRACT

INTRODUCTION. Epistemology, the branch of philosophy that studies the research process and its product, scientific knowledge, spans areas of science approached through positivism and postpositivism, interpretivism, critical theory and transcomplexity, each with the paradigmatic elements of ontology, epistemology and methodology; knowledge of these approaches and their applicability in different fields is fundamental because they generate science. OBJECTIVE. To develop intellectual capacities in the contextual and theoretical bases of the epistemology of social research, essential for professional practice in the field of scientific research and scientific knowledge. MATERIALS AND METHODS. Observational, descriptive study with a known population and sample of 30 publication types, covering June to July 2020. The inclusion criteria were secondary sources of bibliographic information validated in the field of the social sciences. Searches were carried out in the bibliographic search engines PUBMED, SciELO and Scopus, the Dictionary of Health Sciences Descriptors and the Real Academia Española. The literature review comprised the identification, selection, critical analysis, written description, interpretation, discussion and conclusion of the existing information on the epistemology of social research, recorded with a bibliographic reference manager (Microsoft Word). RESULTS. Intellectual capacities were developed by structuring the chronology of the epistemology of social research and of scientific knowledge, and by identifying new perspectives for professional practice in the field of scientific research. CONCLUSION. These perspectives point toward integrating past and future paradigms with a vision of transcomplexity: organological spaces of a large network, the formation of research cyber-communities, the use of an integrative method, a new language in multidisciplinary teams, and agents as the main focus of epistemic theory in space, time and the relationships between things.


Subjects
Thinking/classification, Cognitive Science, Interdisciplinary Research, Knowledge Discovery, Social Validity in Research, Social Learning, Philosophy, Medical, Concept Formation/classification, Knowledge, Ecuador, Knowledge Management
12.
J Biomed Inform ; 143: 104362, 2023 07.
Article in English | MEDLINE | ID: mdl-37146741

ABSTRACT

Scientific literature presents a wealth of information yet to be explored. As the number of researchers increases with each passing year and more publications are released, specialized fields of research become more prevalent. As this trend continues, it further separates interdisciplinary publications and makes keeping up to date with the literature a laborious task. Literature-based discovery (LBD) aims to mitigate these concerns by promoting information sharing among non-interacting literature while extracting potentially meaningful information. Furthermore, recent advances in neural network architectures and data representation techniques have fueled their respective research communities in achieving state-of-the-art performance in many downstream tasks. However, neural network-based methods for LBD remain largely unexplored. We introduce and explore a deep learning neural network-based approach for LBD. Additionally, we investigate various approaches to represent terms as concepts and analyze the effect of feature scaling the representations input to our model. We compare the evaluation performance of our method on five hallmarks of cancer datasets utilized for closed discovery. Our results show that the representation chosen as input to our model affects evaluation performance. We found that feature scaling our input representations increases evaluation performance and decreases the number of epochs needed to achieve model generalization. We also explore two approaches to represent model output. We found that reducing the model's output to a subset of concepts improved evaluation performance at the cost of model generalizability. We also compare the efficacy of our method on the five hallmarks of cancer datasets to a set of randomly chosen relations between concepts. These experiments confirm our method's suitability for LBD.


Subjects
Deep Learning, Neoplasms, Humans, Neural Networks, Computer, Knowledge Discovery/methods, Publications
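
The sketch below illustrates only the feature-scaling step discussed above, applying scikit-learn's min-max and Z-score scalers to a random matrix standing in for concept representations; it is not the paper's LBD model.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(0)

# Stand-in for pretrained concept representations fed to a closed-discovery model.
embeddings = rng.normal(loc=3.0, scale=10.0, size=(1000, 128))

scaled_01 = MinMaxScaler().fit_transform(embeddings)   # rescale each dimension to [0, 1]
scaled_z = StandardScaler().fit_transform(embeddings)  # zero mean, unit variance

print("raw      :", embeddings.mean().round(2), embeddings.std().round(2))
print("min-max  :", scaled_01.min().round(2), scaled_01.max().round(2))
print("z-scored :", scaled_z.mean().round(2), scaled_z.std().round(2))
```
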
13.
J Biomed Inform ; 142: 104383, 2023 06.
Article in English | MEDLINE | ID: mdl-37196989

ABSTRACT

OBJECTIVE: To demonstrate and develop an approach enabling individual researchers or small teams to create their own ad-hoc, lightweight knowledge bases tailored for specialized scientific interests, using text-mining over scientific literature, and demonstrate the effectiveness of these knowledge bases in hypothesis generation and literature-based discovery (LBD). METHODS: We propose a lightweight process using an extractive search framework to create ad-hoc knowledge bases, which require minimal training and no background in bio-curation or computer science. These knowledge bases are particularly effective for LBD and hypothesis generation using Swanson's ABC method. The personalized nature of the knowledge bases allows for a somewhat higher level of noise than "public facing" ones, as researchers are expected to have prior domain experience to separate signal from noise. Fact verification is shifted from exhaustive verification of the knowledge base to post-hoc verification of specific entries of interest, allowing researchers to assess the correctness of relevant knowledge base entries by considering the paragraphs in which the facts were introduced. RESULTS: We demonstrate the methodology by constructing several knowledge bases of different kinds: three knowledge bases that support lab-internal hypothesis generation: Drug Delivery to Ovarian Tumors (DDOT); Tissue Engineering and Regeneration; Challenges in Cancer Research; and an additional comprehensive, accurate knowledge base designated as a public resource for the wider community on the topic of Cell Specific Drug Delivery (CSDD). In each case, we show the design and construction process, along with relevant visualizations for data exploration, and hypothesis generation. For CSDD and DDOT we also show meta-analysis, human evaluation, and in vitro experimental evaluation. CONCLUSION: Our approach enables researchers to create personalized, lightweight knowledge bases for specialized scientific interests, effectively facilitating hypothesis generation and literature-based discovery (LBD). By shifting fact verification efforts to post-hoc verification of specific entries, researchers can focus on exploring and generating hypotheses based on their expertise. The constructed knowledge bases demonstrate the versatility and adaptability of our approach to versatile research interests. The web-based platform, available at https://spike-kbc.apps.allenai.org, provides researchers with a valuable tool for rapid construction of knowledge bases tailored to their needs.


Subjects
Data Mining, Knowledge Discovery, Humans, Data Mining/methods, Knowledge Discovery/methods, Publications
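
As a toy rendering of the post-hoc verification idea above (keep the source paragraph with every extracted triple so a fact of interest can be checked later), the sketch below defines a minimal in-memory knowledge-base entry; it is not the SPIKE-based tooling, and the example triple and identifier are invented.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class KBEntry:
    subject: str
    relation: str
    obj: str
    paper_id: str
    paragraph: str  # provenance kept for post-hoc verification

kb: List[KBEntry] = []

kb.append(KBEntry(
    subject="liposome", relation="delivers", obj="doxorubicin",
    paper_id="PMID:0000000",  # illustrative identifier, not a real record
    paragraph="...doxorubicin encapsulated in PEGylated liposomes accumulated in tumour tissue...",
))

def verify(subject: str, obj: str) -> None:
    """Show the paragraphs behind entries linking `subject` to `obj`."""
    for entry in kb:
        if entry.subject == subject and entry.obj == obj:
            print(f"[{entry.paper_id}] {entry.subject} -{entry.relation}-> {entry.obj}")
            print("  evidence:", entry.paragraph)

verify("liposome", "doxorubicin")
```
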
14.
Eur J Epidemiol ; 38(6): 605-615, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37099244

ABSTRACT

Data discovery, the ability to find datasets relevant to an analysis, increases scientific opportunity, improves rigour and accelerates activity. Rapid growth in the depth, breadth, quantity and availability of data provides unprecedented opportunities and challenges for data discovery. A potential tool for increasing the efficiency of data discovery, particularly across multiple datasets, is data harmonisation. A set of 124 variables, identified as being of broad interest to neurodegeneration, were harmonised using the C-Surv data model. Harmonisation strategies used were simple calibration, algorithmic transformation and standardisation to the Z-distribution. Widely used data conventions, optimised for inclusiveness rather than aetiological precision, were used as harmonisation rules. The harmonisation scheme was applied to data from four diverse population cohorts. Of the 120 variables that were found in the datasets, correspondence between the harmonised data schema and cohort-specific data models was complete or close for 111 (93%). For the remainder, harmonisation was possible with a marginal loss of granularity. Although harmonisation is not an exact science, sufficient comparability across datasets was achieved to enable data discovery with relatively little loss of informativeness. This provides a basis for further work extending harmonisation to a larger variable list, applying the harmonisation to further datasets, and incentivising the development of data discovery tools.


Subjects
Datasets as Topic, Knowledge Discovery, Humans, Reference Standards
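
A hedged sketch of two of the harmonisation strategies named above, simple calibration of units and standardisation to the Z-distribution, applied to two invented cohorts with pandas; it is not the C-Surv data model, and the variables are placeholders.

```python
import pandas as pd

# Two invented cohorts measuring the same constructs in different units/scales.
cohort_a = pd.DataFrame({"weight_lb": [150, 180, 200], "mmse": [28, 24, 30]})
cohort_b = pd.DataFrame({"weight_kg": [70.0, 82.0, 95.0], "mmse": [27, 22, 29]})

# Simple calibration: convert pounds to kilograms so both cohorts share one unit.
cohort_a["weight_kg"] = cohort_a["weight_lb"] * 0.453592

def z_standardise(series: pd.Series) -> pd.Series:
    """Map a variable to the Z-distribution (mean 0, SD 1) within its cohort."""
    return (series - series.mean()) / series.std()

harmonised = pd.concat([
    pd.DataFrame({"cohort": "A", "weight_kg": cohort_a["weight_kg"],
                  "mmse_z": z_standardise(cohort_a["mmse"])}),
    pd.DataFrame({"cohort": "B", "weight_kg": cohort_b["weight_kg"],
                  "mmse_z": z_standardise(cohort_b["mmse"])}),
], ignore_index=True)

print(harmonised)
```
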
15.
PLoS One ; 18(4): e0283933, 2023.
Article in English | MEDLINE | ID: mdl-37018292

ABSTRACT

Recently, the utilization of real-world medical data collected from clinical sites has been attracting attention. Especially as the number of variables in real-world medical data increases, causal discovery becomes more and more effective. On the other hand, it is necessary to develop new causal discovery algorithms suitable for small data sets for situations where sample sizes are insufficient to detect reasonable causal relationships, such as rare diseases and emerging infectious diseases. This study aims to develop a new causal discovery algorithm suitable for a small number of real-world medical data using quantum computing, one of the emerging information technologies attracting attention for its application in machine learning. In this study, a new algorithm that applies the quantum kernel to a linear non-Gaussian acyclic model, one of the causal discovery algorithms, is developed. Experiments on several artificial data sets showed that the new algorithm proposed in this study was more accurate than existing methods with the Gaussian kernel under various conditions in the low-data regime. When the new algorithm was applied to real-world medical data, a case was confirmed in which the causal structure could be correctly estimated even when the amount of data was small, which was not possible with existing methods. Furthermore, the possibility of implementing the new algorithm on real quantum hardware was discussed. This study suggests that the new proposed algorithm using quantum computing might be a good choice among the causal discovery algorithms in the low-data regime for novel medical knowledge discovery.


Subjects
Computational Methodologies, Knowledge Discovery, Quantum Theory, Algorithms, Linear Models
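
The quantum-kernel method itself is not reproduced here; as a loose classical analogue, the sketch below runs DirectLiNGAM from the open-source lingam package (assumed installed) on a small synthetic linear non-Gaussian system to show the kind of causal order and adjacency matrix such algorithms recover in a low-data setting.

```python
import numpy as np
import lingam  # open-source LiNGAM implementation (pip install lingam) -- assumed available

rng = np.random.default_rng(0)
n = 200  # deliberately small sample, mirroring the paper's low-data setting

# Synthetic linear non-Gaussian system with the true structure x0 -> x1 -> x2.
x0 = rng.uniform(-1, 1, n)
x1 = 0.8 * x0 + rng.uniform(-0.5, 0.5, n)
x2 = 0.6 * x1 + rng.uniform(-0.5, 0.5, n)
X = np.column_stack([x0, x1, x2])

model = lingam.DirectLiNGAM()
model.fit(X)

print("estimated causal order:", model.causal_order_)
print("estimated adjacency matrix:\n", model.adjacency_matrix_.round(2))
```
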
16.
Zhongguo Zhong Yao Za Zhi ; 48(6): 1682-1690, 2023 Mar.
Article in Chinese | MEDLINE | ID: mdl-37005856

ABSTRACT

This study aimed to explore the underlying framework and data characteristics of Tibetan prescription information. The information on Tibetan medicine prescriptions was collected based on 11 Tibetan medicine classics, such as Four Medical Canons (Si Bu Yi Dian). The optimal classification method was used to summarize the information structure of Tibetan medicine prescriptions and sort out the key problems and solutions in data collection, standardization, translation, and analysis. A total of 11 316 prescriptions were collected, involving 139 011 entries and 63 567 pieces of efficacy information of drugs in prescriptions. The information on Tibetan medicine prescriptions could be summarized into a "seven-in-one" framework of "serial number-source-name-composition-efficacy-appendix-remarks" and 18 expansion layers, which contained all information related to the inheritance, processing, origin, dosage, semantics, etc. of prescriptions. Based on the framework, this study proposed a "historical timeline" method for mining the origin of prescription inheritance, a "one body and five layers" method for formulating prescription drug specifications, a "link-split-link" method for constructing efficacy information, and an advanced algorithm suitable for the research of Tibetan prescription knowledge discovery. Tibetan medicine prescriptions have obvious characteristics and advantages under the guidance of the theories of "three factors", "five sources", and "Ro-nus-zhu-rjes" of Tibetan medicine. Based on the characteristics of Tibetan medicine prescriptions, this study proposed a multi-level and multi-attribute underlying data architecture, providing new methods and models for the construction of a Tibetan medicine prescription information database and knowledge discovery and improving the consistency and interoperability of Tibetan medicine prescription information with standards at all levels, which is expected to realize the "ancient and modern connection-cleaning up the source-data sharing" goal, so as to promote the informatization and modernization research path of Tibetan medicine prescriptions.


Subjects
Drugs, Chinese Herbal, Medicine, Tibetan Traditional, Knowledge Discovery, Drug Prescriptions, Databases, Factual, Algorithms, Medicine, Chinese Traditional, Drugs, Chinese Herbal/therapeutic use
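
A toy rendering of the "seven-in-one" record framework described above as a Python dataclass; the English field names are paraphrases of the framework's components and the example values are invented, so this is only a sketch of the underlying data architecture, not the study's database schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TibetanPrescription:
    """One record in the 'seven-in-one' framework: serial number, source, name,
    composition, efficacy, appendix, remarks."""
    serial_number: int
    source: str                          # classic text the prescription comes from
    name: str
    composition: List[str]               # drugs in the prescription
    efficacy: List[str]                  # recorded efficacy/indication statements
    appendix: str = ""                   # processing, dosage, inheritance notes, etc.
    remarks: str = ""

example = TibetanPrescription(
    serial_number=1,
    source="Four Medical Canons (Si Bu Yi Dian)",  # title taken from the abstract
    name="Example prescription",                    # invented placeholder
    composition=["drug A", "drug B", "drug C"],
    efficacy=["indication X"],
)
print(example)
```
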
17.
BMC Bioinformatics ; 23(Suppl 9): 570, 2023 Mar 14.
Article in English | MEDLINE | ID: mdl-36918777

ABSTRACT

BACKGROUND: Automatic literature-based discovery attempts to uncover new knowledge by connecting existing facts: information extracted from existing publications in the form of A-B and B-C relations can be simply connected to deduce A-C. However, using this approach, the quantity of proposed connections is often too vast to be useful. It can be reduced by using subject-(predicate)-object triples as the A-B relations, but too many proposed connections remain for manual verification. RESULTS: Based on the hypothesis that only a small number of subject-predicate-object triples extracted from a publication represent the paper's novel contribution(s), we explore using BERT embeddings to identify these before literature-based discovery is performed utilizing only these important triples. While the method exploits the availability of full texts of publications in the CORD-19 dataset (making use of the fact that a novel contribution is likely to be mentioned in both an abstract and the body of a paper) to build a training set, the resulting tool can be applied to papers with only abstracts available. Candidate hidden knowledge pairs generated from unfiltered triples and those built from important triples only are compared using a variety of timeslicing gold standards. CONCLUSIONS: The quantity of proposed knowledge pairs is reduced by a factor of [Formula: see text], and we show that when the gold standard is designed to avoid rewarding background knowledge, the precision obtained increases up to a factor of 10. We argue that the gold standard needs to be carefully considered, and we release as-yet-undiscovered candidate knowledge pairs based on important triples alongside this work.


Subjects
Knowledge Discovery, Knowledge
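
The sketch below only illustrates the intuition behind the approach above, using the sentence-transformers library (assumed installed) rather than the paper's trained BERT classifier: triples are ranked by embedding similarity to the paper's abstract, on invented example text.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-embedding model will do

abstract = ("We show that drug X inhibits protein Y, reducing viral replication "
            "in infected cells.")
triples = [
    ("drug X", "inhibits", "protein Y"),           # likely the novel contribution
    ("SARS-CoV-2", "causes", "COVID-19"),          # background knowledge
    ("protein Y", "is expressed in", "lung tissue"),
]

abstract_emb = model.encode(abstract, convert_to_tensor=True)
triple_embs = model.encode([" ".join(t) for t in triples], convert_to_tensor=True)
scores = util.cos_sim(abstract_emb, triple_embs)[0]

# Triples most similar to the abstract are kept as candidate "important" triples.
for score, triple in sorted(zip(scores.tolist(), triples), reverse=True):
    print(f"{score:.2f}  {triple}")
```
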
18.
Soc Sci Res ; 110: 102817, 2023 02.
Article in English | MEDLINE | ID: mdl-36796993

ABSTRACT

The interdisciplinary field of knowledge discovery and data mining emerged because big data require new analytical methods, beyond traditional statistical approaches, to discover new knowledge from the data mine. This emergent approach is a dialectic research process that is both deductive and inductive. The data mining approach automatically or semi-automatically considers a larger number of joint, interactive, and independent predictors to address causal heterogeneity and improve prediction. Instead of challenging the conventional model-building approach, it plays an important complementary role in improving model goodness of fit, revealing valid and significant hidden patterns in data, identifying nonlinear and non-additive effects, providing insights into data developments, methods, and theory, and enriching scientific discovery. Machine learning builds models and algorithms by learning and improving from data when the explicit model structure is unclear and algorithms with good performance are difficult to attain. The most recent development is to incorporate this new paradigm of predictive modeling with the classical approach of parameter estimation regressions to produce improved models that combine explanation and prediction.


Subjects
Data Mining, Knowledge Discovery, Humans, Data Mining/methods, Machine Learning
19.
J Biomed Inform ; 139: 104318, 2023 03.
Article in English | MEDLINE | ID: mdl-36781035

ABSTRACT

Causal relation extraction of biomedical entities is one of the most complex tasks in biomedical text mining, which involves two kinds of information: entity relations and entity functions. One feasible approach is to take relation extraction and function detection as two independent sub-tasks. However, this separate learning method ignores the intrinsic correlation between them and leads to unsatisfactory performance. In this paper, we propose a joint learning model, which combines entity relation extraction and entity function detection to exploit their commonality and capture their inter-relationship, so as to improve the performance of biomedical causal relation extraction. Experimental results on the BioCreative-V Track 4 corpus show that our joint learning model outperforms the separate models in BEL statement extraction, achieving the F1 scores of 57.0% and 37.3% on the test set in Stage 2 and Stage 1 evaluations, respectively. This demonstrates that our joint learning system reaches the state-of-the-art performance in Stage 2 compared with other systems.


Subjects
Data Mining, Machine Learning, Data Mining/methods, Knowledge Discovery
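
A minimal PyTorch sketch of the joint-learning idea described above: one shared encoder feeding two task heads (relation extraction and function detection) trained with a combined loss. The plain feed-forward encoder and random inputs are stand-ins, not the paper's architecture.

```python
import torch
import torch.nn as nn

class JointModel(nn.Module):
    """Shared encoder with two task-specific heads trained jointly."""
    def __init__(self, input_dim=768, hidden=256, n_relations=5, n_functions=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden), nn.ReLU())
        self.relation_head = nn.Linear(hidden, n_relations)   # entity relation extraction
        self.function_head = nn.Linear(hidden, n_functions)   # entity function detection

    def forward(self, x):
        h = self.encoder(x)
        return self.relation_head(h), self.function_head(h)

model = JointModel()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One toy training step on random "sentence representations" and labels.
x = torch.randn(32, 768)
rel_y = torch.randint(0, 5, (32,))
fun_y = torch.randint(0, 4, (32,))

rel_logits, fun_logits = model(x)
loss = criterion(rel_logits, rel_y) + criterion(fun_logits, fun_y)  # combined joint loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"joint loss: {loss.item():.3f}")
```
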
20.
Math Biosci Eng ; 20(1): 837-858, 2023 01.
Article in English | MEDLINE | ID: mdl-36650791

ABSTRACT

Craniotomy is an invasive operation that causes substantial trauma and many complications, and patients undergoing craniotomy should enter the ICU for monitoring and treatment. Based on electronic medical records (EMR), the discovery of high-risk multi-biomarkers, rather than a single biomarker, that may affect the length of ICU stay (LoICUS) can provide better decision-making or intervention suggestions for clinicians in the ICU and help reduce, as much as possible, these patients' high medical expenses and the associated medical burden. The multi-biomarkers or medical decision rules can be discovered with interpretable predictive models, such as tree-based methods. Our study aimed to develop an interpretable framework based on real-world EMRs to predict the LoICUS and to discover high-risk medical rules for patients undergoing craniotomy. The EMR datasets of patients undergoing craniotomy in the ICU were separated into preoperative and postoperative features. The paper proposes a framework called Rules-TabNet (RTN) based on these datasets. RTN is a rule-based classification model. High-risk medical rules can be discovered from RTN, and a risk analysis process is implemented to validate the rules discovered by RTN. The performance of the postoperative model was considerably better than that of the preoperative model. The postoperative RTN model had better performance than the baseline model, achieving an accuracy of 0.76 and an AUC of 0.85 for the task. Twenty-four key decision rules that may have an impact on the LoICUS of patients undergoing craniotomy are discovered and validated by our framework. The proposed postoperative RTN model in our framework can precisely predict whether patients undergoing craniotomy are hospitalized for too long (more than 15 days) in the ICU. We also discovered and validated some key medical decision rules with our framework.


Subjects
Electronic Health Records, Knowledge Discovery, Humans, Postoperative Complications/etiology, Postoperative Complications/therapy, Intensive Care Units, Craniotomy/adverse effects, Craniotomy/methods
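
Rules-TabNet is not reproduced here; the sketch below shows the general tree-based route to human-readable rules the abstract alludes to, fitting a shallow decision tree to synthetic postoperative features for a "stay longer than 15 days" label and printing its decision paths with scikit-learn's export_text. All feature names and data are invented.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 500

# Synthetic postoperative features (invented names and distributions).
features = np.column_stack([
    rng.normal(140, 20, n),   # systolic blood pressure (invented)
    rng.normal(10, 3, n),     # white blood cell count (invented)
    rng.integers(0, 2, n),    # postoperative infection flag (invented)
])
feature_names = ["blood_pressure", "wbc_count", "infection"]

# Synthetic label: ICU stay longer than 15 days.
long_stay = ((features[:, 2] == 1) & (features[:, 1] > 11)).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(features, long_stay)

# Each root-to-leaf path is a candidate "medical decision rule" for clinician review.
print(export_text(tree, feature_names=feature_names))
```
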